ECPR

virtual

Introduction to Big Data Collection and Analysis in the Social Sciences

Member rate £552.50
Non-Member rate £1105.00

Save £45 Loyalty discount applied automatically*
Save 5% on each additional course booked

* If you attended our Methods School during the calendar years 2024 or 2025, you qualify for £45 off your course fee.

Course Dates and Times

Date: Monday 16 – Friday 20 February 2026
Time: 17:00 – 20:00 CET

Daria Dementeva

daria.dementeva@kuleuven.be

KU Leuven

Save £55 with our Early Bird offer if you register by 5 January 2026. No code required.

This course is an intensive, hands-on introduction to the fundamental concepts, methods, and techniques for collecting and analysing big data in applied social scientific research. The course is fast-paced and requires a substantial commitment, a proactive attitude, and focused engagement.

The primary goal of this course is to equip you with a technical toolkit for addressing social scientific research questions with an innovative methodological outlook. This requires familiarity with the range of big data sources and types, big data collection methods and approaches, and the technical details of data wrangling methodologies and statistical analyses suitable for social scientific research questions that employ big data. The course will help you reflect on the big data-based analytical pipeline as a whole, from formulating a research question, to pinpointing appropriate data sources and methods, to communicating findings.

Learning Outcomes

By the end of the course, you should (be able to):

  • Be familiar with the concept of big data, its types, and its data sources;
  • Be aware of big data quality, big data documentation practices, and related ethical concerns;
  • Program in R;
  • Link machine learning methods to relevant social science questions;
  • Implement and interpret supervised machine learning methods for social science applications in R;
  • Implement and interpret unsupervised machine learning methods for social science applications in R;
  • Link text mining methods to relevant social science questions;
  • Process textual social scientific data;
  • Implement and interpret text analysis methods for social science applications in R;
  • Link web scraping methods to relevant social science questions;
  • Collect data from the web using basics of web scraping and APIs in R;
  • Position LLMs as innovative analytical tools and data sources in social science research;
  • Set up big data-based analytical pipelines in R for your own social science application.

ECTS Credits

4 credits - Engage fully in class activities and complete a post-class assignment


Instructor Bio

Daria Dementeva is a Research Associate in Geospatial AI at the Luxembourg Institute for Socio-Economic Research (LISER), in the Urban Development and Mobility department. Before joining LISER in 2025, Daria obtained an MSc in Statistics and Data Science from KU Leuven, Belgium (2021). She then worked as a PhD researcher (2021-2025) in Social Data Science at the Institute for Social and Political Opinion Research, Center for Sociological Research at KU Leuven. During her PhD, her research focused on integrating big geodata and machine learning into public opinion research, with applications to interethnic group relations in Belgium. Additionally, at KU Leuven, she taught and co-authored an in-depth course on big data and machine learning, with a specific emphasis on hands-on social scientific applications. More information about her work is available on ORCID.

Provisional Daily Schedule

Day 1: Conceptual introduction to big data
  • Big data and computational social science: concepts and definitions
  • Types of big data for social science: taxonomy and data sources
  • Opportunities of big data for social sciences: current applications
  • Limitations of big data for social sciences: data quality, data documentation, ethics, and privacy
Day 2: Introduction to machine learning for social scientific applications
  • Supervised machine learning: definitions, algorithms, and social scientific applications
  • Unsupervised machine learning: definitions, algorithms, and social scientific applications
  • Hands-on lab 1: supervised and unsupervised machine learning in R
Day 3: Introduction to text mining for social scientific applications
  • Text as data
  • Text analysis methods: models and social scientific applications
  • Hands-on lab 2: text mining in R
Day 4: Introduction to basic web scraping
  • Web scraping I: current social scientific applications
  • Web scraping II: APIs, scraping pipelines, and data wrangling
  • Hands-on lab 3: basic web scraping in R
Day 5: Introduction to large language models (LLMs) in social sciences
  • LLMs: definitions and methods
  • LLMs as research tools: current social scientific applications
  • LLMs as a new data source: use-cases
  • Q&A wrap-up session

Prerequisite Knowledge

This course uses the statistical programming language R. You are expected to have a basic command of R for data management and analysis. The course explores a variety of big data and statistical applications within the social sciences and related fields. You are expected to have a basic understanding of exploratory univariate and bivariate statistics and to be familiar with standard regression techniques. Please contact the instructor for guidance if you have no experience with R. An introductory R lab and/or Q&A session may be organised on demand.

Mode of Attendance and Participation Policy

This class follows a live format, which means you are expected to attend sessions in real time online. The cornerstone of your learning experience will be the daily live teaching sessions, totaling three hours per day across the five days of the course.

Class sessions are conducted using a flipped classroom approach. There is a recommended set of preparatory readings and short R coding exercises for each session. While completing these readings and exercises is not mandatory, doing so will enhance your learning experience during class and help you follow the R labs more effectively.

Each class session is divided into two parts. The first part consists of a brief but comprehensive lecture, and the second part is devoted to interactive R labs. During the labs, the instructor will demonstrate R coding, and you are encouraged to either code along with the instructor or follow the demonstrations. You are also encouraged to ask questions about both the coding and lecture-related activities. Class sessions can be recorded upon request.

During the course week, you are expected to dedicate approximately two to three hours per day to preparing for classes, completing the daily readings, and working on the R labs. For an enhanced experience, your learning commitment should ideally extend beyond these sessions.

Class Community and Communication

To enhance the class community, we will use designated software (TBA) as a discussion forum where you can engage with fellow students and instructors. You are encouraged to ask questions, share your conceptual or methodological concerns, and request advice, clarification, or additional materials publicly so that others can see and contribute whenever possible.

The instructor is also available for one-on-one meetings outside of teaching hours. Please email the instructor to schedule an online appointment.

Class Conduct

All participants are expected to engage respectfully with one another. We aim to foster an open, collaborative, welcoming, supportive, and inclusive learning environment for everyone, irrespective of their sociodemographic or socioeconomic backgrounds, religious beliefs, sexual orientation, citizenship, language, physical appearance, and other visible or invisible traits.

Learning Management System

Upon payment and registration for the course, you will gain access to our Learning Management System (LMS) approximately two weeks before the course start date. Here, you will have access to course materials such as pre-course readings. The time commitment required to familiarise yourself with the content and complete any pre-course tasks is estimated to be approximately 20 hours per week leading up to the start date.

ECTS credits

Each course offers the opportunity to be awarded three ECTS credits. Should you wish to earn a fourth credit, you will need to complete a post-course assignment, which will involve approximately 25 hours of work.

Disclaimer

This course description may be subject to subsequent adaptations (e.g. taking into account new developments in the field, participant demands, group size, etc.). Registered participants will be informed at the time of change. By registering for this course, you confirm that you possess the knowledge required to follow it.